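Understanding human mobility is essential for the development of smart cities and social behavior research. Human mobility models can be used in many applications, including pandemic control, urban planning, and traffic management. Existing models predict users' mobility patterns with an accuracy of less than 25%. This low accuracy may be explained by the flexible nature of human movement: people's daily movements are not rigid. Moreover, rigid mobility models may miss hidden regularities in users' records. We therefore propose a novel perspective for studying and analyzing human mobility patterns and capturing their flexibility. Typically, a mobility pattern is represented by a sequence of locations. We propose to define mobility patterns by abstracting these locations into a set of places. Labeling these locations allows us to detect hidden patterns that are closer to reality. We present IMAP, an individual human mobility patterns visualization platform. Our platform lets users visualize a graph of the places they have visited, based on their history records. In addition, the platform displays the most frequent mobility patterns, computed using a modified PrefixSpan approach.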
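To make the idea of mining frequent mobility patterns over labeled places concrete, here is a minimal sketch of plain PrefixSpan on single-item sequences. It is not the modified variant used by IMAP, whose details the abstract does not give; the function name `prefixspan`, the place labels, and the `min_support` value are all illustrative assumptions.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every sequential pattern that occurs
    (as a not necessarily contiguous subsequence) in at least
    `min_support` of the input sequences."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffixes
        # that remain after matching `prefix` as early as possible.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # keep only the first occurrence
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, new_projected in occurrences.items():
            # Each sequence contributes at most one entry, so the list
            # length is the support of the extended pattern.
            if len(new_projected) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(new_projected)
                mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical daily trajectories, already abstracted into labeled places.
days = [
    ["home", "work", "gym", "home"],
    ["home", "work", "home"],
    ["home", "cafe", "work", "home"],
]
patterns = prefixspan(days, min_support=3)
for pattern, support in sorted(patterns.items()):
    print(" -> ".join(pattern), support)
```

Because the patterns are subsequences rather than exact routes, the detour through the gym or the cafe does not hide the recurring home, work, home routine, which is the kind of flexibility the abstract argues for.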
Translated by Google Translate
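We present DIY-IPS (Do It Yourself Indoor Positioning System), an open-source, real-time indoor positioning mobile application. DIY-IPS detects a user's indoor position by using dual-band RSSI fingerprinting of the available WiFi access points. The app detects the user's indoor position in real time with no additional infrastructure cost. We release the app as open source to save other researchers the time of recreating it. The app enables researchers and users to (1) collect indoor positioning datasets with ground-truth labels, (2) customize the app for higher accuracy or for other research purposes, and (3) test the accuracy of modified methods through live testing against ground truth. We ran preliminary experiments to demonstrate the effectiveness of the app.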
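As a rough illustration of RSSI fingerprinting, the sketch below matches a live scan against stored fingerprints with a k-nearest-neighbour vote. The abstract does not say which matching method DIY-IPS uses, so k-NN, the `MISSING_RSSI` placeholder, the access-point names, and all signal values are assumptions made for the example.

```python
import math
from collections import Counter

# Assumed placeholder (in dBm) for an access point that a scan did not hear.
MISSING_RSSI = -100.0


def distance(scan_a, scan_b):
    """Euclidean distance between two scans keyed by (access point, band)."""
    keys = set(scan_a) | set(scan_b)
    return math.sqrt(sum(
        (scan_a.get(k, MISSING_RSSI) - scan_b.get(k, MISSING_RSSI)) ** 2
        for k in keys
    ))


def locate(scan, fingerprints, k=3):
    """Return the majority label among the k stored fingerprints closest to scan."""
    nearest = sorted(fingerprints, key=lambda fp: distance(scan, fp[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled fingerprints: each access point can contribute one
# reading per band (2.4 GHz and 5 GHz), which is what "dual-band" means here.
fingerprints = [
    ({("ap1", "2.4"): -40, ("ap1", "5"): -48, ("ap2", "2.4"): -75}, "kitchen"),
    ({("ap1", "2.4"): -43, ("ap1", "5"): -50, ("ap2", "2.4"): -72}, "kitchen"),
    ({("ap1", "2.4"): -70, ("ap1", "5"): -82,
      ("ap2", "2.4"): -45, ("ap2", "5"): -52}, "office"),
    ({("ap1", "2.4"): -68, ("ap2", "2.4"): -47, ("ap2", "5"): -55}, "office"),
]

scan = {("ap1", "2.4"): -42, ("ap1", "5"): -49, ("ap2", "2.4"): -74}
print(locate(scan, fingerprints))
```

Treating each (access point, band) pair as a separate feature is one simple way to use both bands; 5 GHz signals attenuate faster through walls, so the extra readings can help separate adjacent rooms.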
An important component of an automated fact-checking system is the claim check-worthiness detection system, which ranks sentences by prioritising them based on their need to be checked. Despite a body of research tackling the task, previous research has overlooked the challenging nature of identifying check-worthy claims across different topics. In this paper, we assess and quantify the challenge of detecting check-worthy claims for new, unseen topics. After highlighting the problem, we propose the AraCWA model to mitigate the performance deterioration when detecting check-worthy claims across topics. The AraCWA model enables boosting the performance for new topics by incorporating two components for few-shot learning and data augmentation. Using a publicly available dataset of Arabic tweets consisting of 14 different topics, we demonstrate that our proposed data augmentation strategy achieves substantial improvements across topics overall, where the extent of the improvement varies across topics. Further, we analyse the semantic similarities between topics, suggesting that the similarity metric could be used as a proxy to determine the difficulty level of an unseen topic prior to undertaking the task of labelling the underlying sentences.
Brain networks provide important insights for the diagnosis of many brain disorders, and how to model brain structure effectively has become one of the core issues in brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationships (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity gives the direction of information flow, which may provide additional information for the diagnosis of brain diseases. However, existing methods either ignore the temporal lag in information transmission across brain regions, or simply set the temporal lag between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) that simultaneously infers the causal relationships and the temporal-lag values between brain regions, and that can be trained in an end-to-end manner. In addition, we introduce three mechanisms to better guide the modeling of brain networks. Evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.
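In this paper, we derive a new method for determining shared features of datasets by employing joint non-negative matrix factorization and analyzing the resulting factorizations. Our approach uses the joint factorization of two dataset matrices $X_1, X_2$ into non-negative matrices, $X_1 = AS_1$ and $X_2 = AS_2$, to derive a similarity measure that determines how well the shared basis $A$ for $X_1, X_2$ approximates each dataset. We also propose a dataset distance measure built on this method and the learned factorization. Our method successfully identifies differences in both image and text datasets. Potential applications include classification, detecting plagiarism or other manipulation, and learning relationships between datasets.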
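The joint factorization $X_1 = AS_1$, $X_2 = AS_2$ can be sketched by stacking the two datasets side by side and running standard multiplicative NMF updates, so that both share the basis $A$. This is a minimal sketch under assumptions: the abstract names neither the optimizer nor the exact similarity measure, so the Lee-Seung updates, the function names, and the per-dataset relative reconstruction error used below are illustrative stand-ins, and the data are synthetic.

```python
import numpy as np


def joint_nmf(x1, x2, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor x1 ~ a @ s1 and x2 ~ a @ s2 with a shared non-negative basis a."""
    rng = np.random.default_rng(seed)
    x = np.hstack([x1, x2])  # columns are samples; both datasets share rows
    a = rng.random((x.shape[0], rank)) + eps
    s = rng.random((rank, x.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective; they keep
        # every entry non-negative.
        s *= (a.T @ x) / (a.T @ a @ s + eps)
        a *= (x @ s.T) / (a @ s @ s.T + eps)
    return a, s[:, :x1.shape[1]], s[:, x1.shape[1]:]


def relative_error(x, a, s):
    """Assumed proxy for how well the shared basis a approximates dataset x."""
    return np.linalg.norm(x - a @ s) / np.linalg.norm(x)


# Synthetic example: both datasets are generated from the same 3-column basis.
rng = np.random.default_rng(1)
a_true = rng.random((20, 3))
x1 = a_true @ rng.random((3, 15))
x2 = a_true @ rng.random((3, 10))

a, s1, s2 = joint_nmf(x1, x2, rank=3)
print(relative_error(x1, a, s1), relative_error(x2, a, s2))
```

If one dataset is reconstructed much worse than the other under the shared basis, that asymmetry hints that the two do not share the same underlying features, which is the intuition behind a factorization-based dataset distance.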
We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure.
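The Neural Tangent Kernel (NTK) has emerged as a powerful tool for providing memorization, optimization, and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer containing $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. A natural open question is therefore whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as small as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.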
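To illustrate the quantity being bounded, the sketch below builds the empirical NTK Gram matrix $K = JJ^\top$ of a small network on $N$ samples and reads off its smallest eigenvalue. It uses a two-layer ReLU network as a simplification, whereas the abstract concerns deep networks; the sizes, the $1/\sqrt{m}$ scaling, and the function name `empirical_ntk` are assumptions for the example.

```python
import numpy as np


def empirical_ntk(x, w, a):
    """NTK Gram matrix of f(x) = a . relu(w x) / sqrt(m) at parameters (w, a).

    x: (n, d) inputs, w: (m, d) hidden weights, a: (m,) output weights.
    """
    n, m = x.shape[0], w.shape[0]
    pre = x @ w.T                       # (n, m) pre-activations
    act = np.maximum(pre, 0.0)          # ReLU outputs
    gate = (pre > 0).astype(x.dtype)    # ReLU derivative
    grad_a = act / np.sqrt(m)           # df/da_j, shape (n, m)
    # df/dw_j = a_j * relu'(w_j . x) * x / sqrt(m), shape (n, m, d)
    grad_w = (gate * a)[:, :, None] * x[:, None, :] / np.sqrt(m)
    jac = np.hstack([grad_a, grad_w.reshape(n, -1)])  # (n, number of params)
    return jac @ jac.T


rng = np.random.default_rng(0)
n, d, m = 8, 5, 16                      # m + m * d = 96 parameters > n samples
x = rng.standard_normal((n, d))
w = rng.standard_normal((m, d))
a = rng.standard_normal(m)

kernel = empirical_ntk(x, w, a)
lambda_min = np.linalg.eigvalsh(kernel)[0]  # eigenvalues come back ascending
print(lambda_min)
```

A strictly positive smallest eigenvalue means the Jacobian has full row rank on the training set, which is the property that memorization and gradient-descent convergence arguments of this kind rely on.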
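In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the L_2 loss between the outputs of the target and the compressed network under the assumption of Gaussian inputs. Using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and we quantify the error of this approximation as a function of the input dimension and N. In the mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation, we conjecture that the optimum of the simplified optimization problem is achieved by taking weights on an Equiangular Tight Frame (ETF), while the scaling of the weights and the orientation of the ETF depend on the target network. Numerical evidence is provided to support this conjecture.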
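Question answering (QA) is one of the most challenging problems in natural language processing (NLP). QA systems attempt to produce answers to given questions. These answers can be generated from unstructured or structured text. QA is therefore considered an important research area that can be used to evaluate text understanding systems. A large volume of QA research has been devoted to the English language, investigating the most advanced techniques and achieving state-of-the-art results. However, progress in Arabic question answering has been considerably slower, owing to the scarcity of research efforts in Arabic QA and the lack of large benchmark datasets. Recently, many pre-trained language models have delivered high performance on many Arabic NLP problems. In this work, we evaluate state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. We fine-tune and compare the performance of the AraBERTv2-base model, the AraBERTv0.2-large model, and the AraELECTRA model. Finally, we provide an analysis to understand and interpret the low performance obtained by some of the models.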